生成对抗网络(GAN)是能够合成数据样本的强大模型,与真实数据的分布非常相似,但是由于所谓的模式崩溃现象在gans中观察到了这些生成样品的多样性受到限制。尤其容易崩溃的是有条件的gan,它们倾向于忽略输入噪声矢量并专注于条件信息。提议减轻这种限制的最新方法增加了生成的样品的多样性,但是当需要样品相似性时,它们会降低模型的性能。为了解决这一缺点,我们提出了一种新颖的方法,可以选择性地增加GAN生成样品的多样性。通过在训练损失功能中添加简单但有效的正则化,我们鼓励发电机发现与不同输出相关的输入的新数据模式,同时为其余的输出生成一致的样本。更确切地说,我们最大化生成的图像与输入潜在向量之间的距离之比,根据给定条件输入的样品的多样性缩放效果。我们在合成基准测试中显示了我们方法的优势,以及在CERN LHC的Alice实验零度量热计中模拟数据的现实情况。
translated by 谷歌翻译
我们引入了一种新的自动评估方法,用于说话者相似性评估,这与人类感知得分一致。现代神经文本到语音模型需要大量的干净训练数据,这就是为什么许多解决方案从单个扬声器模型转换为在许多不同扬声器的示例中训练的解决方案的原因。多扬声器模型带来了新的可能性,例如更快地创建新声音,也是一个新问题 - 扬声器泄漏,其中合成示例的扬声器身份可能与目标扬声器的示例不符。当前,发现此问题的唯一方法是通过昂贵的感知评估。在这项工作中,我们提出了一种评估说话者相似性的自动方法。为此,我们扩展了有关说话者验证系统的最新工作,并评估不同的指标和说话者嵌入模型如何以隐藏的参考和锚(Mushra)分数反映多个刺激。我们的实验表明,我们可以训练一个模型来预测扬声器嵌入的扬声器相似性,其精度为0.96的扬声器嵌入,并且在话语级别上最高0.78 Pearson分数。
translated by 谷歌翻译
Transfer learning is a popular technique for improving the performance of neural networks. However, existing methods are limited to transferring parameters between networks with same architectures. We present a method for transferring parameters between neural networks with different architectures. Our method, called DPIAT, uses dynamic programming to match blocks and layers between architectures and transfer parameters efficiently. Compared to existing parameter prediction and random initialization methods, it significantly improves training efficiency and validation accuracy. In experiments on ImageNet, our method improved validation accuracy by an average of 1.6 times after 50 epochs of training. DPIAT allows both researchers and neural architecture search systems to modify trained networks and reuse knowledge, avoiding the need for retraining from scratch. We also introduce a network architecture similarity measure, enabling users to choose the best source network without any training.
translated by 谷歌翻译
Music discovery services let users identify songs from short mobile recordings. These solutions are often based on Audio Fingerprinting, and rely more specifically on the extraction of spectral peaks in order to be robust to a number of distortions. Few works have been done to study the robustness of these algorithms to background noise captured in real environments. In particular, AFP systems still struggle when the signal to noise ratio is low, i.e when the background noise is strong. In this project, we tackle this problematic with Deep Learning. We test a new hybrid strategy which consists of inserting a denoising DL model in front of a peak-based AFP algorithm. We simulate noisy music recordings using a realistic data augmentation pipeline, and train a DL model to denoise them. The denoising model limits the impact of background noise on the AFP system's extracted peaks, improving its robustness to noise. We further propose a novel loss function to adapt the DL model to the considered AFP system, increasing its precision in terms of retrieved spectral peaks. To the best of our knowledge, this hybrid strategy has not been tested before.
translated by 谷歌翻译
This paper presents a method for detection and recognition of traffic signs based on information extracted from an event camera. The solution used a FireNet deep convolutional neural network to reconstruct events into greyscale frames. Two YOLOv4 network models were trained, one based on greyscale images and the other on colour images. The best result was achieved for the model trained on the basis of greyscale images, achieving an efficiency of 87.03%.
translated by 谷歌翻译
This paper proposes the use of an event camera as a component of a vision system that enables counting of fast-moving objects - in this case, falling corn grains. These type of cameras transmit information about the change in brightness of individual pixels and are characterised by low latency, no motion blur, correct operation in different lighting conditions, as well as very low power consumption. The proposed counting algorithm processes events in real time. The operation of the solution was demonstrated on a stand consisting of a chute with a vibrating feeder, which allowed the number of grains falling to be adjusted. The objective of the control system with a PID controller was to maintain a constant average number of falling objects. The proposed solution was subjected to a series of tests to determine the correctness of the developed method operation. On their basis, the validity of using an event camera to count small, fast-moving objects and the associated wide range of potential industrial applications can be confirmed.
translated by 谷歌翻译
As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in L2RL and making L2RL useful for practical applications requires more than developing individual L2RL algorithms; it requires making progress at the systems-level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing different aspects of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API allowing easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system. We also introduce an evaluation environment in order to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of Starcraft-2 minigames and allows for the fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.
translated by 谷歌翻译
CNN-based surrogates have become prevalent in scientific applications to replace conventional time-consuming physical approaches. Although these surrogates can yield satisfactory results with significantly lower computation costs over small training datasets, our benchmarking results show that data-loading overhead becomes the major performance bottleneck when training surrogates with large datasets. In practice, surrogates are usually trained with high-resolution scientific data, which can easily reach the terabyte scale. Several state-of-the-art data loaders are proposed to improve the loading throughput in general CNN training; however, they are sub-optimal when applied to the surrogate training. In this work, we propose SOLAR, a surrogate data loader, that can ultimately increase loading throughput during the training. It leverages our three key observations during the benchmarking and contains three novel designs. Specifically, SOLAR first generates a pre-determined shuffled index list and accordingly optimizes the global access order and the buffer eviction scheme to maximize the data reuse and the buffer hit rate. It then proposes a tradeoff between lightweight computational imbalance and heavyweight loading workload imbalance to speed up the overall training. It finally optimizes its data access pattern with HDF5 to achieve a better parallel I/O throughput. Our evaluation with three scientific surrogates and 32 GPUs illustrates that SOLAR can achieve up to 24.4X speedup over PyTorch Data Loader and 3.52X speedup over state-of-the-art data loaders.
translated by 谷歌翻译
大多数人工智能(AI)研究都集中在高收入国家,其中成像数据,IT基础设施和临床专业知识丰富。但是,在需要医学成像的有限资源环境中取得了较慢的进步。例如,在撒哈拉以南非洲,由于获得产前筛查的机会有限,围产期死亡率的率很高。在这些国家,可以实施AI模型,以帮助临床医生获得胎儿超声平面以诊断胎儿异常。到目前为止,已经提出了深度学习模型来识别标准的胎儿平面,但是没有证据表明它们能够概括获得高端超声设备和数据的中心。这项工作研究了不同的策略,以减少在高资源临床中心训练并转移到新的低资源中心的胎儿平面分类模型的域转移效果。为此,首先在丹麦的一个新中心对1,008例患者的新中心进行评估,接受了1,008名患者的新中心,后来对五个非洲中心(埃及,阿尔及利亚,乌干达,加纳和马拉维进行了相同的表现),首先在丹麦的一个新中心进行评估。 )每个患者有25名。结果表明,转移学习方法可以是将小型非洲样本与发达国家现有的大规模数据库相结合的解决方案。特别是,该模型可以通过将召回率提高到0.92 \ pm 0.04 $,同时又可以维持高精度。该框架显示了在临床中心构建可概括的新AI模型的希望,该模型在具有挑战性和异质条件下获得的数据有限,并呼吁进行进一步的研究,以开发用于资源较少的国家 /地区的AI可用性的新解决方案。
translated by 谷歌翻译
本文介绍了亚当·米基维奇大学(Adam Mickiewicz University)(AMU)提交的《 WMT 2022一般MT任务》的踪迹。我们参加了乌克兰$ \ leftrightarrow $捷克翻译指示。这些系统是基于变压器(大)体系结构的四个模型的加权合奏。模型使用源因素来利用输入中存在的命名实体的信息。合奏中的每个模型仅使用共享任务组织者提供的数据培训。一种嘈杂的反向翻译技术用于增强培训语料库。合奏中的模型之一是文档级模型,该模型在平行和合成的更长序列上训练。在句子级的解码过程中,集合生成了N最佳列表。 n-最佳列表与单个文档级模型生成的n-最佳列表合并,该列表一次翻译了多个句子。最后,使用现有的质量估计模型和最小贝叶斯风险解码来重新列出N最好的列表,因此根据彗星评估指标选择了最佳假设。根据自动评估结果,我们的系统在两个翻译方向上排名第一。
translated by 谷歌翻译